Kernel Fusion for Video Retrieval Tasks

Author

  • Pascal Michaillat
Abstract

In the context of the annual TRECVID challenge, this paper presents a comprehensive statistical framework for classifying video shots. We first design and analyze a broad set of language and video features. While most teams in this challenge have not been able to improve their baseline language performance by incorporating visual information, we show that a strong vision system can be a major asset. By leveraging techniques developed for visual object recognition, we can detect categories for which language alone carries no information. We then present an elegant way to integrate information from different sources into a single non-parametric classifier. We compute kernel matrices for different information sources and for a variety of features, and use semi-definite programming to learn the optimal kernel matrix. This optimal kernel is a positive linear combination of the prespecified kernel matrices and is determined simultaneously with the decision boundary of the 1-norm soft-margin SVM. Our preliminary results on a reduced data set are encouraging and suggest that our data fusion approach could outperform most of the usual classification techniques when multiple sources of information are available.
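
The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the kernel weights are fixed by hand rather than learned by semi-definite programming, and a kernel perceptron stands in for the 1-norm soft-margin SVM. The toy features, the labels, and all function names are hypothetical, chosen so that the text features carry no information about the label.

```python
import math

def linear_kernel(X):
    """Gram matrix of dot products between all pairs of rows in X."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
             for z in X] for x in X]

def combine_kernels(kernels, weights):
    """Positive linear combination K = sum_i mu_i * K_i with mu_i >= 0."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def kernel_perceptron(K, y, epochs=20):
    """Train on a precomputed Gram matrix; returns per-sample mistake counts."""
    n = len(y)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified (or undecided) sample
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return alpha

def predict(K, y, alpha, i):
    """Predicted label (+1 or -1) for training sample i."""
    score = sum(alpha[j] * y[j] * K[j][i] for j in range(len(y)))
    return 1 if score > 0 else -1

# Hypothetical toy data: four shots, each with a text and a visual descriptor.
# Shots 0 and 1 share the same text features but have different labels.
text_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
visual_feats = [[0.0], [1.0], [0.0], [1.0]]
labels = [1, -1, 1, -1]

K_text = linear_kernel(text_feats)
K_vis = rbf_kernel(visual_feats, gamma=1.0)
# Hand-picked weights (an assumption); the paper learns them jointly with the SVM.
K_fused = combine_kernels([K_text, K_vis], weights=[0.5, 0.5])

alpha = kernel_perceptron(K_fused, labels)
fused_pred = [predict(K_fused, labels, alpha, i) for i in range(len(labels))]
print(fused_pred)
```

With `K_text` alone, shots 0 and 1 have identical kernel columns, so no kernel classifier could assign them different labels. Adding the visual kernel to the combination is what makes the toy problem separable.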

Related resources

News Video Retrieval by Learning Multimodal Semantic Information

With the explosion of multimedia data, especially video data, the need for efficient video retrieval has become increasingly important. Years of TREC Video Retrieval Evaluation (TRECVID) research have provided a benchmark for the video search task. The video data in TRECVID are mainly news video. In this paper a compound model consisting of several atom search modules, i.e., textual and visual, f...

ITI-CERTH participation to TRECVID 2012

This paper provides an overview of the tasks submitted to TRECVID 2012 by ITI-CERTH. ITI-CERTH participated in the Known-item search (KIS), in the Semantic Indexing (SIN), as well as in the Event Detection in Internet Multimedia (MED) and the Multimedia Event Recounting (MER) tasks. In the SIN task, techniques are developed that combine video representations that express motion semantics with ...

IBM Research TREC 2002 Video Retrieval System

In this paper, we describe the IBM Research system for analysis, indexing, and retrieval of video, which was applied to the TREC-2002 video retrieval benchmark. The system explores methods for fully-automatic content analysis, shot boundary detection, multi-modal feature extraction, statistical modeling for semantic concept detection, and speech recognition and indexing. The system supports que...

Learning Multi-modal Similarity

In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key cha...

UESTC at ImageCLEF 2012 Medical Tasks

This paper describes the methods used and results achieved by our research group in the ImageCLEF 2012 medical retrieval and classification tasks. We performed three sub-tasks: ad-hoc retrieval, case-based retrieval, and modality classification. For the retrieval tasks, we combined semantic-based retrieval with traditional text-based retrieval. The semantic-based retrieval was conducted by comp...

Publication date: 2006